WC 2021

Program at a Glance

10th World Congress in Probability and Statistics

Plenary Wed-1: Laplace Lecture (Tony Cai)
Plenary Wed-2: Public Lecture (Young-Han Kim)
Plenary Wed-3: Wald Lecture 2 (Martin Barlow)
Plenary Wed-4: IMS Medallion Lecture (Gerard Ben Arous)

Invited 05: Recent Advances in Shape Constrained Inference (Organizer: Bodhisattva Sen)
Invited 06: Optimization in Statistical Learning (Organizer: Garvesh Raskutti)

Invited 01: Conformal Invariance and Related Topics (Organizer: Hao Wu)
Invited 14: Optimal Transport (Organizer: Philippe Rigollet)
Invited 21: Probabilistic Theory of Mean Field Games (Organizer: Xin Guo)
Invited 35: Stochastic Analysis in Mathematical Finance and Insurance (Organizer: Marie Kratz)
Invited 40: KSS Invited Session: Nonparametric and Semi-parametric Approaches in Survival Analysis (Organizer: Woncheol Jang)

Invited 03: Potential Theory for Non-local Operators and Jump Processes (Organizer: Panki Kim)
Invited 10: Change-point Problems for Complex Data (Organizer: Claudia Kirch)
Invited 12: Statistics for Data with Geometric Structure (Organizer: Sungkyu Jung)
Invited 25: Random Graphs (Organizer: Christina Goldschmidt)
Invited 36: Problems and Approaches in Multi-Armed Bandits (Organizer: Vianney Perchet)

Organized 09: Random Matrices and Infinite Particle Systems (Organizer: Hirofumi Osada)
Organized 18: Advanced Learning Methods for Complex Data Analysis (Organizer: Xinlei Wang)
Organized 27: Bayesian Inference for Complex Models (Organizer: Joungyoun Kim)
Organized 28: Recent Advances in Time Series Analysis (Organizer: Changryoung Baek)

Organized 03: Gaussian Processes (Organizer: Naomi Feldheim)
Organized 20: Theories and Applications for Complex Data Analysis (Organizer: Arlene K.H. Kim)

Organized 29: Sequential Analysis and Applications (Organizer: Alexander Tartakovsky)

Contributed 29: Spatial Data Analysis

Contributed 13: Random Structures
Contributed 20: Copula Modeling
Contributed 26: Multivariate Data Analysis
Contributed 31: Statistical Prediction

Contributed 03: Numerical Study of Stochastic Processes / Stochastic Interacting Systems
Contributed 08: Study of Various Distributions
Contributed 12: Optimal Transport
Contributed 27: Machine Learning / Structural Equation

Poster II-1: Poster Session II-1
Poster II-2: Poster Session II-2

Contributed Session (live Q&A at Track 2, 9:30PM KST)

Contributed 13

Random Structures

Conference time: 9:30 PM to 10:00 PM KST

Local time: Jul 21 Wed, 5:30 AM to 6:00 AM PDT

Universal phenomena for random constrained permutations

Jacopo Borga (University of Zurich)

How do local/global constraints affect the limiting shape of random permutations? This is a classical question that has received considerable attention in the last 15 years. In this talk we give an overview of some recent results on this topic, mainly focusing on random pattern-avoiding permutations. We first introduce a notion of scaling limit for permutations, called permutons. Then we present some recent results that highlight certain universal phenomena for permuton limits of various families of pattern-avoiding permutations. These results will lead us to the definition of three remarkable new limiting random permutons: the “biased Brownian separable permuton”, the “Baxter permuton” and the “skew Brownian permuton”. We finally discuss some recent results that show how permuton limits are useful to investigate the behaviour of certain statistics on random pattern-avoiding permutations, such as the length of the longest increasing subsequence.
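The two permutation statistics mentioned in the abstract, pattern containment and the length of the longest increasing subsequence, can be computed directly for small examples. The sketch below is illustrative only: the function names are hypothetical, the pattern check is brute force (fine for short patterns), and the LIS length uses the standard patience-sorting method rather than anything permuton-specific.

```python
from bisect import bisect_left
from itertools import combinations


def standardize(values):
    """Map a sequence of distinct numbers to its relative order, as a 1-based permutation."""
    return tuple(1 + sum(v < w for v in values) for w in values)


def contains_pattern(perm, pattern):
    """Brute-force check: does perm contain a subsequence order-isomorphic to pattern?"""
    k = len(pattern)
    target = tuple(pattern)
    return any(standardize([perm[i] for i in idx]) == target
               for idx in combinations(range(len(perm)), k))


def lis_length(perm):
    """Length of the longest strictly increasing subsequence via patience sorting, O(n log n)."""
    tails = []  # tails[i] = smallest possible tail of an increasing subsequence of length i + 1
    for x in perm:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)
```

For example, `lis_length([3, 1, 2, 5, 4])` is 3 (the subsequence 1, 2, 5), and `[3, 1, 2, 5, 4]` avoids the pattern 231 while `[2, 3, 1]` trivially contains it.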

The scaling limit of the strongly connected components of a uniform directed graph with an i.i.d. degree sequence

Serte Donderwinkel (University of Oxford)

Spherical principal curves

Jongmin Lee (Seoul National University)

This paper presents a new approach to dimension reduction for data observed on spherical surfaces. Several dimension reduction techniques for non-Euclidean data have been developed in recent years. In pioneering work, Hauberg (2016) attempted to implement principal curves on Riemannian manifolds. However, that approach relies on approximations to process data on Riemannian manifolds, which distorts the results. This study proposes a new approach that projects data onto a continuous curve to construct principal curves on spherical surfaces. Our approach follows the line of Hastie and Stuetzle (1989), who proposed principal curves for data in Euclidean space. We further investigate the stationarity of the proposed principal curves, which satisfy self-consistency on spherical surfaces. Results from real data analysis and simulation examples show promising empirical properties of the proposed approach.
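To make the projection and self-consistency ideas concrete, here is a minimal sketch of one Hastie-Stuetzle-style iteration on the unit sphere S². It is not the authors' method: it assumes the curve is discretized into nodes (rather than continuous), projects each point to the nearest node by great-circle distance, and swaps in the extrinsic (normalized Euclidean) mean as the local average. A real principal-curve fit would also smooth the updated nodes; that step is omitted. All function names are hypothetical.

```python
import numpy as np


def geodesic_distance(a, b):
    """Great-circle distances between rows of a (n x 3) and rows of b (m x 3), unit vectors."""
    return np.arccos(np.clip(a @ b.T, -1.0, 1.0))


def project_onto_curve(points, curve):
    """Index of the nearest curve node (in geodesic distance) for each data point."""
    return np.argmin(geodesic_distance(points, curve), axis=1)


def extrinsic_mean(points):
    """Euclidean mean pushed back onto the sphere (assumes the mean is nonzero)."""
    m = points.mean(axis=0)
    return m / np.linalg.norm(m)


def self_consistency_step(points, curve):
    """Replace each node by the spherical average of the points projected onto it."""
    idx = project_onto_curve(points, curve)
    new_curve = curve.copy()
    for k in range(len(curve)):
        assigned = points[idx == k]
        if len(assigned):
            new_curve[k] = extrinsic_mean(assigned)
    return new_curve
```

With the equator as the initial curve and two points placed symmetrically above and below it, the step leaves the curve unchanged, which is the self-consistency property in its simplest form.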

Q&A for Contributed Session 13

This talk does not have an abstract.

Session Chair

Namgyu Kang (Korea Institute for Advanced Study)

Contributed 20

Copula Modeling

Conference time: 9:30 PM to 10:00 PM KST

Local time: Jul 21 Wed, 5:30 AM to 6:00 AM PDT

Estimation of multivariate generalized gamma convolutions through Laguerre expansions

Oskar Laverny (Université Lyon 1)

The class of generalized gamma convolutions appeared in Thorin's work on the infinite divisibility of the log-normal and Pareto distributions. Although these distributions have been studied extensively in the univariate case, the multivariate case and the dependence structures that can arise from it have received little attention in the literature. Furthermore, only one projection procedure has recently been constructed for the univariate case, and no estimation procedure is available. By expanding the densities of multivariate generalized gamma convolutions in a tensorized Laguerre basis, we bridge this gap and provide efficient estimation procedures for both the univariate and multivariate cases. We provide some insight into the performance of these procedures, and a convergent series for the density of multivariate gamma convolutions, which is shown to be more stable than Moschopoulos's and Mathai's univariate series. We also discuss some examples.
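As a univariate illustration of the Laguerre-expansion idea (not the authors' tensorized multivariate estimator), the sketch below uses the orthonormal Laguerre functions ψ_k(x) = √2 L_k(2x) e^(−x) on [0, ∞) and estimates each coefficient a_k = E[ψ_k(X)] by a sample mean, giving the projection estimator f̂(x) = Σ_k â_k ψ_k(x). The Laguerre polynomials are built with the standard three-term recurrence; the function names and the exponential test sample are assumptions for illustration.

```python
import numpy as np


def laguerre_functions(x, n_terms):
    """Array of shape (n_terms, len(x)) with psi_k(x) = sqrt(2) * L_k(2x) * exp(-x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = 2.0 * x
    polys = np.empty((n_terms, x.size))
    polys[0] = 1.0
    if n_terms > 1:
        polys[1] = 1.0 - t
    # Recurrence: (k + 1) L_{k+1}(t) = (2k + 1 - t) L_k(t) - k L_{k-1}(t)
    for k in range(1, n_terms - 1):
        polys[k + 1] = ((2 * k + 1 - t) * polys[k] - k * polys[k - 1]) / (k + 1)
    return np.sqrt(2.0) * polys * np.exp(-x)


def fit_coefficients(sample, n_terms):
    """Projection estimator: a_k = E[psi_k(X)] estimated by the empirical mean."""
    return laguerre_functions(sample, n_terms).mean(axis=1)


def density(coefs, x):
    """Truncated expansion f(x) = sum_k a_k psi_k(x)."""
    return coefs @ laguerre_functions(x, len(coefs))


rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=100_000)
coefs = fit_coefficients(sample, n_terms=4)
```

For an Exp(1) sample the exact expansion has a_0 = 1/√2 and all other coefficients zero, so the fitted density at x = 1 should be close to e^(−1).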

Copula-based Markov zero-inflated count time series models

Mohammed Alqawba (Qassim University)

Count time series data with excess zeros are observed in several applied disciplines. When these zero-inflated counts are recorded sequentially, they may exhibit serial dependence. Ignoring the zero inflation and the serial dependence may produce inaccurate results. In this paper, we propose Markov zero-inflated count time series models based on the joint distribution of consecutive observations, which is constructed through copula functions. First- and second-order Markov chains are considered, with univariate margins given by zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), or zero-inflated Conway-Maxwell-Poisson (ZICMP) distributions. Bivariate copulas such as the Gaussian, Frank, and Gumbel copulas are chosen to construct the bivariate distribution of two consecutive observations, while the trivariate Gaussian and max-infinitely divisible copulas are considered to build the joint distribution of three consecutive observations. Likelihood-based inference is performed and asymptotic properties are studied. Simulated examples are used to evaluate the estimation method and the asymptotic results. The proposed class of models is applied to a sandstorm count data set. The results suggest that the proposed models have some advantages over existing models in the literature for modeling zero-inflated count time series data.
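For discrete margins, a copula defines the joint probability mass of two consecutive counts through finite differences of C(F(a), F(b)). The sketch below assumes a first-order chain with ZIP(λ, π) margins and a Frank copula with parameter θ (one of the copulas named above), and derives the transition probability P(Y_t = b | Y_{t−1} = a). Function names and parameter values are hypothetical; the talk's likelihood-based estimation is not reproduced.

```python
import math


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf: extra mass pi at zero, Poisson(lam) otherwise."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * pois


def zip_cdf(y, lam, pi):
    if y < 0:
        return 0.0
    return sum(zip_pmf(k, lam, pi) for k in range(y + 1))


def frank_copula(u, v, theta):
    """Frank copula C(u, v); theta > 0 gives positive dependence."""
    if theta == 0:
        return u * v
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def joint_pmf(a, b, lam, pi, theta):
    """P(Y_{t-1} = a, Y_t = b) via the rectangle (finite-difference) formula."""
    F = lambda y: zip_cdf(y, lam, pi)
    C = lambda u, v: frank_copula(u, v, theta)
    return (C(F(a), F(b)) - C(F(a - 1), F(b))
            - C(F(a), F(b - 1)) + C(F(a - 1), F(b - 1)))


def transition_prob(b, a, lam, pi, theta):
    """P(Y_t = b | Y_{t-1} = a) for the first-order Markov chain."""
    return joint_pmf(a, b, lam, pi, theta) / zip_pmf(a, lam, pi)
```

Because C(u, 1) = u, the transition probabilities from any state sum to one, and with θ > 0 a zero tends to be followed by another zero more often than the marginal zero probability alone would suggest.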

Bi-factor and second-order copula models for item response data

Sayed H. Kadhem (University of East Anglia)

Bi-factor and second-order models based on copulas are proposed for item response data in which the items can be split into non-overlapping groups with homogeneous dependence within each group. Our general models include the Gaussian bi-factor and second-order models as special cases and can place more probability in the joint upper or lower tails than those Gaussian models. Details are given on maximum likelihood estimation of the parameters of the bi-factor and second-order copula models, as well as on model selection and goodness-of-fit techniques. Our general methodology is demonstrated with an extensive simulation study and illustrated on data from the Toronto Alexithymia Scale. Our studies suggest that there can be a substantial improvement over the Gaussian bi-factor and second-order models, both conceptually, since the items can be interpreted as latent maxima/minima or mixtures of means rather than latent means, and in fit to data.
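The Gaussian bi-factor model, which the abstract names as a special case, has a simple latent correlation structure: item j loads on a common factor with loading β_{j0} and on its own group factor with loading β_{j1}, so for i ≠ j the correlation is β_{i0}β_{j0} + β_{i1}β_{j1} when i and j share a group and β_{i0}β_{j0} otherwise. The sketch below builds that matrix; the loadings and grouping are hypothetical, and the non-Gaussian copula extensions from the talk are not shown.

```python
import numpy as np


def bifactor_correlation(global_loadings, group_loadings, groups):
    """Latent correlation matrix of a Gaussian bi-factor model.

    Z_j = b0_j * W_0 + b1_j * W_{g(j)} + sqrt(1 - b0_j^2 - b1_j^2) * eps_j,
    with independent standard normal factors W and errors eps.
    """
    b0 = np.asarray(global_loadings, dtype=float)
    b1 = np.asarray(group_loadings, dtype=float)
    g = np.asarray(groups)
    if np.any(b0 ** 2 + b1 ** 2 >= 1.0):
        raise ValueError("each item needs positive unique variance: b0^2 + b1^2 < 1")
    same_group = g[:, None] == g[None, :]
    R = np.outer(b0, b0) + np.outer(b1, b1) * same_group
    np.fill_diagonal(R, 1.0)
    return R


R = bifactor_correlation([0.6, 0.5, 0.7, 0.4, 0.5],
                         [0.4, 0.5, 0.3, 0.6, 0.5],
                         [0, 0, 0, 1, 1])
```

Items 0 and 1 share a group, so their correlation is 0.6·0.5 + 0.4·0.5 = 0.5; items 0 and 3 are linked only through the common factor, giving 0.6·0.4 = 0.24.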

Q&A for Contributed Session 20

This talk does not have an abstract.

Session Chair

Daewoo Pak (Yonsei University)

Contributed 26

Multivariate Data Analysis

Conference time: 9:30 PM to 10:00 PM KST

Local time: Jul 21 Wed, 5:30 AM to 6:00 AM PDT

A nonparametric test for paired data

Grzegorz Wyłupek (Institute of Mathematics, University of Wrocław)

The paper proposes a weighted Kolmogorov-Smirnov-type test for the two-sample problem when the data are paired. We derive the asymptotic distribution of the test statistic under the null model and prove the consistency of the test against general alternatives. Because the asymptotic distribution of the test statistic depends on the dependence structure of the data, the wild bootstrap must be used for inference. The bootstrap version of the test controls the Type I error under the null model and performs very well under the alternative. Tools from empirical process theory play the main role in the proofs.
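A minimal sketch of the general recipe, assuming an unweighted statistic (the author's weight function is not given here): for pairs (X_i, Y_i), let d_i(t) = 1{X_i ≤ t} − 1{Y_i ≤ t}, take T = √n · sup_t |mean_i d_i(t)|, and approximate its null distribution by multiplying the centered summands by independent Rademacher signs. Evaluating the supremum only at observed points and the function name are assumptions.

```python
import numpy as np


def paired_ks_wild_bootstrap(x, y, n_boot=999, rng=None):
    """Unweighted KS-type test for equal marginals of paired data, wild-bootstrap p-value."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    grid = np.concatenate([x, y])  # the sup is attained at observed points
    # d[i, j] = 1{x_i <= t_j} - 1{y_i <= t_j}; pairing is kept row-wise
    d = (x[:, None] <= grid[None, :]).astype(float) - (y[:, None] <= grid[None, :]).astype(float)
    d_bar = d.mean(axis=0)
    stat = np.sqrt(n) * np.abs(d_bar).max()
    centered = d - d_bar
    boot = np.empty(n_boot)
    for b in range(n_boot):
        xi = rng.choice([-1.0, 1.0], size=n)  # Rademacher multipliers, one per pair
        boot[b] = np.sqrt(n) * np.abs(xi @ centered / n).max()
    p_value = (1 + np.sum(boot >= stat)) / (n_boot + 1)
    return stat, p_value
```

Multiplying whole rows by the same sign preserves the within-pair dependence, which is why a wild (multiplier) bootstrap suits paired data better than resampling X and Y separately.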

Inference for Generalized Multivariate Analysis of Variance (GMANOVA) models, under multivariate skew t distribution for modelling skewed and heavy-tailed data

Sayantee Jana (Indian Institute of Management Nagpur)

The most widely used statistical model, in both research and practice, is the linear model, owing to its simplicity and interpretability. Linear models are preferred, even as approximations, for both univariate and multivariate data, especially since multivariate skewed models carry added complexity of their own; researchers therefore prefer not to add further layers of complexity by adopting non-linear models. The Generalized Multivariate Analysis of Variance (GMANOVA) model is one such linear model, useful for the analysis of longitudinal data, that is, repeated measurements of a continuous variable on several individuals across an ordered variable such as time, temperature, or pressure. Its bilinear structure allows comparisons between groups while preserving the temporal structure of the data, unlike Multivariate Analysis of Variance (MANOVA), which allows for no temporal ordering or temporal correlation in the model. GMANOVA models are widely used in economics, the social and physical sciences, medical research, and pharmaceutical studies. However, even though financial data are time-varying, the traditional GMANOVA model has few, if any, applications in finance, owing to the skewed and volatile nature of such data. This makes the Multivariate Skew t (MST) distribution a natural candidate for financial data, since its heavy tails allow outliers in the data to be modelled. Indeed, applications in portfolio analysis, including mutual funds and capital asset pricing, are all modelled using elliptical distributions, especially the multivariate t distribution. The classical GMANOVA model assumes multivariate normality, so inferential tools developed for it may not be appropriate for skewed and heavy-tailed data.
In our study, we first explore the sensitivity of inferential tools developed under multivariate normality when applied to skewed and volatile data, and then develop inferential tools for the GMANOVA model under the MST distribution.
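The bilinear structure can be written as Y = XBZ + E, where Y (n × p) holds the repeated measurements, X (n × k) is the between-individual design (e.g. group indicators), and Z (q × p) is the within-individual design (e.g. a polynomial in time). Under the classical normality assumption, the maximum likelihood estimator is B̂ = (X'X)⁻¹X'Y S⁻¹Z'(Z S⁻¹Z')⁻¹ with S = Y'(I − P_X)Y. The sketch below implements that normal-theory estimator only, i.e. the baseline whose sensitivity the talk studies, not the MST-based tools; the simulated design is hypothetical.

```python
import numpy as np


def gmanova_mle(Y, X, Z):
    """Normal-theory MLE of B in the GMANOVA (growth curve) model Y = X B Z + E."""
    n = Y.shape[0]
    XtX = X.T @ X
    P_X = X @ np.linalg.solve(XtX, X.T)             # projection onto the column space of X
    S = Y.T @ (np.eye(n) - P_X) @ Y                 # within-group sum-of-squares matrix
    S_inv = np.linalg.inv(S)
    left = np.linalg.solve(XtX, X.T @ Y)            # (X'X)^{-1} X'Y, shape k x p
    return left @ S_inv @ Z.T @ np.linalg.inv(Z @ S_inv @ Z.T)   # shape k x q


rng = np.random.default_rng(1)
n_per_group = 150
X = np.kron(np.eye(2), np.ones((n_per_group, 1)))  # two groups, indicator columns
t = np.arange(4.0)
Z = np.vstack([np.ones(4), t])                     # linear growth curve over 4 time points
B_true = np.array([[1.0, 0.5], [2.0, -0.3]])       # rows: groups; columns: intercept, slope
Y = X @ B_true @ Z + 0.1 * rng.standard_normal((2 * n_per_group, 4))
B_hat = gmanova_mle(Y, X, Z)
```

Each row of B̂ is a group-specific growth curve (intercept and slope here), which is how the model compares groups while keeping the time ordering that MANOVA ignores.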

Multiscale representation of directional scattered data: use of anisotropic radial basis functions

Junhyeon Kwon (Seoul National University)

Spatial inhomogeneity along a one-dimensional curve makes two-dimensional data non-stationary. The curvelet transform, first proposed by Candès and Donoho (1999), is one of the best-known multiscale methods for representing directional singularities, but it requires the data to be observed on equally spaced sites. On the other hand, radial basis function interpolation is widely used to approximate the underlying function from scattered data. However, the isotropy of radial basis functions lowers the efficiency of directional representation. This research proposes a new multiscale method that uses anisotropic radial basis functions to efficiently represent directional structure from noisy scattered data in two-dimensional Euclidean space. The basis functions are orthogonalized across scales so that each scale can represent global or local directional structure separately. Numerical experiments show that the proposed method is remarkably effective at representing directional scattered data. Convergence properties and practical issues in implementation are discussed as well.
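The basic building block, a single-scale interpolant with anisotropic Gaussian radial basis functions, can be sketched as follows. Each basis function is exp(−(x − c)ᵀA(x − c)) with A = R diag(1/s_along², 1/s_across²) Rᵀ, so it is elongated along a chosen angle. This is only one scale with a fixed orientation; the multiscale construction and cross-scale orthogonalization from the talk are not reproduced, and the tanh edge used as test data is a hypothetical example.

```python
import numpy as np


def anisotropic_metric(angle, scale_along, scale_across):
    """Metric A = R diag(1/s_along^2, 1/s_across^2) R^T for a kernel elongated along `angle`."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([1.0 / scale_along ** 2, 1.0 / scale_across ** 2]) @ R.T


def kernel_matrix(points, centers, A):
    """K[i, j] = exp(-(p_i - c_j)^T A (p_i - c_j))."""
    diff = points[:, None, :] - centers[None, :, :]
    q = np.einsum("ijk,kl,ijl->ij", diff, A, diff)
    return np.exp(-q)


def fit_rbf(centers, values, A):
    """Solve the interpolation system K w = f and return the interpolant."""
    weights = np.linalg.solve(kernel_matrix(centers, centers, A), values)
    return lambda pts: kernel_matrix(np.atleast_2d(pts), centers, A) @ weights


rng = np.random.default_rng(2)
centers = rng.uniform(size=(30, 2))                            # scattered sites in the unit square
values = np.tanh((centers[:, 0] - centers[:, 1]) / 0.05)       # sharp edge along the line x = y
A = anisotropic_metric(np.pi / 4, scale_along=0.2, scale_across=0.05)
interpolant = fit_rbf(centers, values, A)
```

Aligning the long axis of the kernels with the edge (angle π/4 here) is what lets a few basis functions capture a directional feature that isotropic kernels would need many more to represent.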

Q&A for Contributed Session 26

This talk does not have an abstract.

Session Chair

Yunjin Choi (University of Seoul)

Contributed 31

Statistical Prediction

Conference time: 9:30 PM to 10:00 PM KST

Local time: Jul 21 Wed, 5:30 AM to 6:00 AM PDT

Robust geodesic regression

Ha-Young Shin (Seoul National University)

This study explores robust regression for data on Riemannian manifolds. Geodesic regression is the generalization of linear regression to a setting with a manifold-valued dependent variable and one or more real-valued independent variables. The existing work on geodesic regression uses the sum-of-squared errors to find the solution, but as in the classical Euclidean case, the least-squares method is highly sensitive to outliers. In this study, we use M-type estimators, including the L1, Huber and Tukey biweight estimators, to perform robust geodesic regression, and describe how to calculate the tuning parameters for the latter two. We show that, on compact symmetric spaces, all M-type estimators are maximum likelihood estimators, and argue for the overall superiority of the L1 estimator over the L2 and Huber estimators on high-dimensional manifolds and over the Tukey biweight estimator on compact high-dimensional manifolds. A derivation of the Riemannian Gaussian distribution on k-dimensional spheres is also included. Results from numerical examples, including analysis of real neuroimaging data, demonstrate the promising empirical properties of the proposed approach.
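The robust variants differ only in the loss applied to geodesic residuals. The sketch below writes the objective Σ_i ρ(d(y_i, Exp_p(x_i v))) on the unit sphere S², with p the base point, v a tangent vector at p, and ρ one of the L2, L1, Huber, or Tukey biweight losses. The tuning constants 1.345 and 4.685 are the usual Euclidean 95%-efficiency defaults, used here only as placeholders: the talk describes how to calculate manifold-appropriate values, which are not reproduced. The optimizer over (p, v) is omitted, and the function names are hypothetical.

```python
import numpy as np


def rho_l2(r):
    return 0.5 * r ** 2


def rho_l1(r):
    return np.abs(r)


def rho_huber(r, c=1.345):
    """Quadratic near zero, linear in the tails (c is a placeholder Euclidean default)."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * a ** 2, c * a - 0.5 * c ** 2)


def rho_tukey(r, c=4.685):
    """Tukey biweight: bounded loss, constant c^2 / 6 beyond |r| = c."""
    a = np.minimum(np.abs(r) / c, 1.0)
    return (c ** 2 / 6.0) * (1.0 - (1.0 - a ** 2) ** 3)


def sphere_exp(p, v):
    """Exponential map on the unit sphere at p applied to tangent vector v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p.copy()
    return np.cos(norm) * p + np.sin(norm) * v / norm


def sphere_dist(x, y):
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))


def geodesic_objective(p, v, xs, ys, rho):
    """Sum of rho(geodesic residual) for the model y_i ~ Exp_p(x_i * v)."""
    residuals = np.array([sphere_dist(sphere_exp(p, x * v), y) for x, y in zip(xs, ys)])
    return rho(residuals).sum()
```

Swapping `rho_l2` for `rho_l1`, `rho_huber`, or `rho_tukey` changes how strongly a single far-away observation can pull the fitted geodesic, which is the point of the M-type estimators described above.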

A multi-sigmoidal logistic model: statistical analysis and first-passage-time application

Paola Paraggio (Università degli Studi di Salerno (UNISA))

Sigmoidal growth models are widely used in various applied fields, from biology to software reliability and economics. Usually, they describe dynamics in restricted environments.
However, many real phenomena exhibit several phases, each following a sigmoidal pattern. Motivated by these more complex dynamics, many researchers have investigated generalized versions of classical sigmoidal models characterized by several inflection points.
Along these lines, the present work considers a generalization of the classical logistic growth model obtained by introducing a polynomial term into its expression. The model is described by a stochastic differential equation obtained from its deterministic counterpart by adding a multiplicative noise term. The resulting diffusion process, which has a multi-sigmoidal mean, may be useful for describing growth dynamics in which the evolution occurs in stages.
The problem of finding maximum likelihood estimates of the parameters that define the process is also addressed. Specifically, the likelihood function is maximized by means of meta-heuristic optimization techniques. Moreover, various strategies for selecting the optimal degree of the polynomial are provided.
Further, the first-passage-time (FPT) problem is considered: an approximation of the FPT density is obtained numerically using the R package fptdApprox.
Finally, some simulated examples are presented.
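To illustrate the general shape of such a model, the sketch below assumes one possible multi-sigmoidal logistic form, dX = Q'(t) X (1 − X/K) dt + σ X dW, where Q is a polynomial. With σ = 0 its solution is K / (1 + c e^(−Q(t))), which grows in stages when Q' dips without vanishing. The specific polynomial, parameter values, and drift form are assumptions, not the authors' model. The path is simulated with the Euler-Maruyama scheme, and first-passage times are read off simulated paths; this Monte Carlo approach is a swapped-in alternative to the numerical FPT density approximation that fptdApprox provides.

```python
import numpy as np


def Q(t):
    """Hypothetical polynomial with Q(0) = 0 and Q'(t) = (t - 2)^2 + 0.5 > 0."""
    return t ** 3 / 3.0 - 2.0 * t ** 2 + 4.5 * t


def dQ(t):
    return t ** 2 - 4.0 * t + 4.5


def simulate_path(x0, K, sigma, T, dt, rng=None):
    """Euler-Maruyama for dX = Q'(t) X (1 - X/K) dt + sigma X dW."""
    rng = np.random.default_rng(rng)
    n = int(round(T / dt))
    times = np.linspace(0.0, T, n + 1)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        drift = dQ(times[i]) * x[i] * (1.0 - x[i] / K)
        noise = sigma * x[i] * np.sqrt(dt) * rng.standard_normal()
        x[i + 1] = x[i] + drift * dt + noise
    return times, x


def first_passage_time(times, path, level):
    """First grid time at which the path reaches `level` (NaN if it never does)."""
    hits = np.nonzero(path >= level)[0]
    return times[hits[0]] if hits.size else np.nan


times, x_det = simulate_path(x0=0.05, K=1.0, sigma=0.0, T=3.0, dt=1e-3)
fpt_det = first_passage_time(times, x_det, level=0.5)
```

Repeating `simulate_path` with σ > 0 and different seeds and collecting `first_passage_time` gives an empirical FPT distribution that can be compared with a numerical density approximation.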

Statistical inference for functional linear problems

Tim Kutta (Ruhr University Bochum)

In this talk we consider the linear regression model Y = SX + ε with functional regressors and responses. This model has attracted much attention in terms of estimation and prediction, but less is known about statistical inference for the unobservable slope operator S. We discuss new inference tools to detect relevant deviations of the parameter S from a hypothesized slope S'. As modes of comparison we consider the Hilbert-Schmidt norm ||S - S'||^2 as well as the prediction error E||SX - S'X||^2. Our theory is based on the novel technique of "smoothness shifting", which helps us circumvent existing negative results on the weak convergence of estimators for S. In contrast to all related work, the proposed test statistic converges at the rate N^(-1/2), permitting fast detection of local alternatives. Furthermore, while most existing procedures rely on i.i.d. observations for Gaussian approximations, our test statistic converges even in the presence of dependence, quantified by φ-mixing or strong mixing. Thanks to a self-normalization procedure, our approach is user-friendly, computationally inexpensive and robust.
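Both modes of comparison are easy to evaluate numerically when S and S' are integral operators on L²[0, 1] with kernels s(t, u) and s'(t, u), and X is a centered process with covariance kernel c(u, v). Writing δ = s − s', one has ||S − S'||²_HS = ∫∫ δ(t, u)² du dt and E||(S − S')X||² = ∫∫∫ δ(t, u) c(u, v) δ(t, v) du dv dt. The sketch below approximates both with the midpoint rule. The integral-operator form, the centering of X, and the example kernels are assumptions for illustration; this computes the population quantities, not the talk's self-normalized test.

```python
import numpy as np


def midpoint_grid(m):
    return (np.arange(m) + 0.5) / m


def hs_distance_sq(kernel_s, kernel_s0, m=200):
    """Midpoint-rule approximation of ||S - S'||_HS^2 for integral operators on [0, 1]."""
    t = midpoint_grid(m)
    T, U = np.meshgrid(t, t, indexing="ij")
    delta = kernel_s(T, U) - kernel_s0(T, U)
    return np.sum(delta ** 2) / m ** 2


def prediction_error(kernel_s, kernel_s0, cov, m=200):
    """Approximation of E||(S - S')X||^2 for a centered X with covariance kernel cov."""
    t = midpoint_grid(m)
    T, U = np.meshgrid(t, t, indexing="ij")
    D = kernel_s(T, U) - kernel_s0(T, U)   # D[i, j] = delta(t_i, u_j)
    C = cov(T, U)                          # C[j, k] = c(u_j, u_k)
    h = 1.0 / m
    return np.trace(D @ C @ D.T) * h ** 3


s = lambda t, u: t * u
s0 = lambda t, u: np.zeros_like(t)
brownian_cov = lambda u, v: np.minimum(u, v)
hs_sq = hs_distance_sq(s, s0)
pred_err = prediction_error(s, s0, brownian_cov)
```

For δ(t, u) = tu the exact values are 1/9 for the Hilbert-Schmidt distance and, with X a standard Brownian motion, (1/3)(2/15) = 2/45 for the prediction error; the two criteria differ because the second weights deviations by how much variance X actually has in each direction.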

Q&A for Contributed Session 31

This talk does not have an abstract.

Session Chair

Changwon Lim (Chung-Ang University)